Abstract

A degree-$d$ polynomial $p$ in $n$ variables over a field $\mathbb{F}$ is equidistributed if it takes on each of its $|\mathbb{F}|$ values close to equally often, and biased otherwise. We say that $p$ has low rank if it can be expressed as a bounded combination of polynomials of lower degree. Green and Tao [GT07] have shown that bias implies low rank over large fields (i.e. for the case $d < |\mathbb{F}|$). They have also conjectured that bias implies low rank over general fields. In this work we affirmatively answer their conjecture. Using this result we obtain a general worst-case to average-case reduction for polynomials. That is, we show that a polynomial that can be approximated by few polynomials of bounded degree can be computed by few polynomials of bounded degree. We also relate our results to the construction of pseudorandom generators and to the question of testing concise representations.
1 Introduction



Let $\mathbb{F}$ be a prime finite field. Let $p : \mathbb{F}^n \to \mathbb{F}$ be a polynomial in $n$ variables over $\mathbb{F}$ of degree at most $d$. We say that $p$ is equidistributed if it takes on each of its $|\mathbb{F}|$ values close to equally often, and biased otherwise. We say that $p$ has low rank if it can be expressed as a bounded combination of polynomials of lower degree, and high rank otherwise. More formally, we consider the following definitions.

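Definition 1 (bias). The bias of a function $f : \mathbb{F}^n \to \mathbb{F}$ is defined to be

$$\mathrm{bias}(f) = \left|\mathbb{E}_{x \in \mathbb{F}^n}\left[\omega^{f(x)}\right]\right|$$

where $\omega$ stands for a primitive $|\mathbb{F}|$-th root of unity, i.e. $\omega = e^{2\pi i/|\mathbb{F}|}$.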
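To make Definition 1 concrete, the following is a minimal Python sketch that computes the bias of a function over a small prime field by brute-force enumeration of $\mathbb{F}^n$. The helper name `bias` and the convention of passing $f$ as a Python callable on tuples are illustrative assumptions, and enumerating all $|\mathbb{F}|^n$ points is only feasible for tiny $n$.

```python
import cmath
import itertools


def bias(f, n, p):
    """Brute-force |E_{x in F_p^n}[omega^{f(x)}]| for f: F_p^n -> F_p (p prime).

    f is a callable on n-tuples of ints in range(p); its output is reduced mod p.
    Enumerates all p**n points, so it is only usable for tiny n.
    """
    omega = cmath.exp(2j * cmath.pi / p)  # primitive p-th root of unity
    points = itertools.product(range(p), repeat=n)
    total = sum(omega ** (f(x) % p) for x in points)
    return abs(total) / p ** n


# A non-constant linear function is perfectly equidistributed.
print(round(bias(lambda x: x[0] + x[1], n=2, p=3), 9))  # 0.0
# x1*x2 over F_2 is 1 on one of four points: bias |3 - 1| / 4 = 0.5.
print(round(bias(lambda x: x[0] * x[1], n=2, p=2), 9))  # 0.5
```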

Definition 2 (rank). Let $p(X)$ be a degree-$d$ polynomial over $\mathbb{F}^n$. $\mathrm{rank}_{d-1}(p)$ is the smallest integer $k$ such that there exist degree-$(d-1)$ polynomials $q_1(X), \ldots, q_k(X)$ and a function $F : \mathbb{F}^k \to \mathbb{F}$ s.t. $p(X) = F(q_1(X), \ldots, q_k(X))$.
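The following sketch (a hypothetical helper, not part of the paper) checks, for tiny $n$, whether a given $p$ is a function of given candidate polynomials $q_1, \ldots, q_k$, i.e. whether some $F$ as in Definition 2 exists. It certifies the upper bound $\mathrm{rank}_{d-1}(p) \le k$ for that particular candidate list only; it does not compute the rank, which would require searching over all candidate lists.

```python
import itertools


def is_function_of(p, qs, n, q):
    """Return True iff p(x) is determined by (q_1(x), ..., q_k(x)) on all of F_q^n.

    Builds the lookup table F with p = F(q_1, ..., q_k) point by point and
    returns False at the first conflict. Brute force: only for tiny n.
    """
    table = {}
    for x in itertools.product(range(q), repeat=n):
        key = tuple(g(x) % q for g in qs)
        value = p(x) % q
        if table.setdefault(key, value) != value:
            return False  # same (q_1, ..., q_k) values, different p values
    return True


# Over F_2, p = (x1 + x2)(x3 + x4) is quadratic but is a function of two linear forms.
p = lambda x: (x[0] + x[1]) * (x[2] + x[3])
linear_forms = [lambda x: x[0] + x[1], lambda x: x[2] + x[3]]
print(is_function_of(p, linear_forms, n=4, q=2))      # True
print(is_function_of(p, linear_forms[:1], n=4, q=2))  # False
```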

Green and Tao [GT07] have shown that over large fields bias implies low rank.

Theorem 1 (Theorem 1.7 in [GT07]). Let $p(X)$ be a degree-$d$ polynomial over $\mathbb{F}^n$, where $d < |\mathbb{F}|$. If $\mathrm{bias}(p) \ge \delta > 0$, then $\mathrm{rank}_{d-1}(p) \le c(\delta, d)$.

In their paper, Green and Tao conjecture that the restriction $d < |\mathbb{F}|$ can be removed, but their proof technique breaks down when $d \ge |\mathbb{F}|$. Note that things might behave differently over large fields than over small fields. One important example is the Inverse Conjecture for the Gowers Norm. This conjecture roughly says that if the $d$-th derivative of a polynomial is biased, then the polynomial has non-negligible correlation with some polynomial of degree $d-1$. The Inverse Conjecture for the Gowers Norm was proven to be true over large fields by [GT07], but was proven to be false over small fields [GT07, LMS]. One of the main tools used for proving the conjecture over large fields was Theorem 1, which itself was proven only over large fields.

One could ask whether the above theorem remains true over smaller fields or becomes false there. We show that the [GT07] result holds over general fields. In this respect, as opposed to the case of the Inverse Conjecture for the Gowers Norm, large and small fields behave similarly.

1.1 Our Main Results 

Our first main theorem shows that bias implies low rank over general fields.

Theorem 2 (Bias implies low rank over general fields). Let $p(X)$ be a degree-$d$ polynomial over $\mathbb{F}^n$ s.t. $\mathrm{bias}(p) \ge \delta > 0$. Then $\mathrm{rank}_{d-1}(p) \le c(d, \delta)$. That is, there exist degree-$(d-1)$ polynomials $q_1(X), \ldots, q_c(X)$ and a function $F : \mathbb{F}^c \to \mathbb{F}$ s.t. $p(X) = F(q_1(X), \ldots, q_c(X))$, where $c = c(d, \delta)$. Moreover, $q_1, \ldots, q_c$ are derivatives of the form $p(X + a) - p(X)$ where $a \in \mathbb{F}^n$.

Most of the technical part of the paper is dedicated to proving Theorem 2. The proof goes by induction on the degree $d$ of $p(X)$. For $d = 1$ the theorem holds trivially: a non-constant linear polynomial is perfectly equidistributed (its bias is $0$), so a biased polynomial of degree 1 is constant. So we assume Theorem 2 holds for all degrees smaller than $d$, and prove it for degree $d$.
Our second main theorem is obtained as a corollary of Theorem 2. The theorem is a worst-case to average-case reduction for polynomials. It says that a polynomial that can be approximated by few polynomials of bounded degree can be computed by few polynomials of bounded degree. We now define and prove this rigorously.

Definition 3 ($\delta$-approximation). We say a function $f : \mathbb{F}^n \to \mathbb{F}$ $\delta$-approximates $p(X)$ if:

$$\left|\mathbb{E}_{x \in \mathbb{F}^n}\left[\omega^{p(x) - f(x)}\right]\right| \ge \delta$$

Theorem 3 (Worst-case to average-case reduction for polynomials of bounded degree). Let $p(X)$ be a polynomial of degree $d$, $g_1, \ldots, g_c$ polynomials of degree $k$ ($d, c, k = O(1)$), and $F : \mathbb{F}^c \to \mathbb{F}$ a function s.t. the composition $G(X) = F(g_1(X), \ldots, g_c(X))$ $\delta$-approximates $p$. Then there exist $c'$ polynomials $h_1, \ldots, h_{c'}$ and a function $F' : \mathbb{F}^{c'} \to \mathbb{F}$ s.t.

$$F'(h_1(X), \ldots, h_{c'}(X)) = p(X)$$

Moreover, $c' = c'(d, c, k)$ (i.e. independent of $n$), and each $h_i$ is of the form $p(X + a) - p(X)$ or $g_j(X + a)$ for some $a \in \mathbb{F}^n$. In particular, if $k \le d - 1$ then also $\deg(h_i) \le d - 1$.

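Proof. Expand $\omega^{F(z_1, \ldots, z_c)} : \mathbb{F}^c \to \mathbb{C}$ in the Fourier basis. If $F(g_1(X), \ldots, g_c(X))$ $\delta$-approximates $p(X)$ then, since there are $|\mathbb{F}|^c$ Fourier coefficients, each of absolute value at most $1$, the character corresponding to some Fourier coefficient $\delta'$-approximates $p$, with $\delta' \ge \delta|\mathbb{F}|^{-c}$. That means there exist $\alpha_1, \ldots, \alpha_c \in \mathbb{F}$ s.t. the polynomial

$$p'(X) = p(X) - (\alpha_1 g_1(X) + \ldots + \alpha_c g_c(X))$$

has bias at least $\delta'$. Using Theorem 2 we get that $p'$ can be computed by at most $c'$ of its derivatives. We can now use them together with $\alpha_1 g_1 + \ldots + \alpha_c g_c$ to compute $p$. □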
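The Fourier step of the proof can be checked numerically on toy instances. The sketch below, assuming a prime field and brute-force enumeration (all names are illustrative), computes how well $G = F(g_1, \ldots, g_c)$ approximates $p$, and the best approximation of $p$ by a single linear combination $\sum_j \alpha_j g_j$; the proof guarantees the latter is at least $\delta|\mathbb{F}|^{-c}$.

```python
import cmath
import itertools


def correlation(f, h, n, q):
    """|E_x[omega^{f(x) - h(x)}]| over x in F_q^n (q prime), by brute force."""
    omega = cmath.exp(2j * cmath.pi / q)
    points = list(itertools.product(range(q), repeat=n))
    return abs(sum(omega ** ((f(x) - h(x)) % q) for x in points)) / len(points)


def fourier_step(p, gs, F, n, q):
    """Return (delta, delta_prime, alpha) for the Fourier step in the proof of Theorem 3.

    delta:       correlation of p with G(x) = F(g_1(x), ..., g_c(x)),
    delta_prime: best correlation of p with a linear combination sum_j alpha_j g_j,
    alpha:       a maximising coefficient vector in F_q^c.
    """
    G = lambda x: F(tuple(g(x) % q for g in gs))
    delta = correlation(p, G, n, q)
    candidates = []
    for alpha in itertools.product(range(q), repeat=len(gs)):
        combo = lambda x, a=alpha: sum(aj * g(x) for aj, g in zip(a, gs))
        candidates.append((correlation(p, combo, n, q), alpha))
    delta_prime, alpha = max(candidates)
    return delta, delta_prime, alpha


# Toy instance over F_2: G = F(g_1, g_2) with F(z) = z_1 * z_2 computes p = x1*x2*x3 exactly.
d, dp, a = fourier_step(
    lambda x: x[0] * x[1] * x[2],
    [lambda x: x[0] * x[1], lambda x: x[2]],
    lambda z: z[0] * z[1],
    n=3,
    q=2,
)
print(d >= 1 - 1e-9, dp >= d / 2 ** 2)  # True True: delta' >= delta * |F|^{-c}
```

In this toy instance $\delta = 1$, so the guarantee is only $\delta' \ge 1/4$, while the best combination actually achieves $3/4$ (for example the trivial combination, since $|\mathbb{E}[(-1)^{x_1x_2x_3}]| = 3/4$).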

2 Significance of Results 

Bias implies low rank over general fields. Our first main theorem (Theorem 2) shows that the phenomenon "bias implies low rank" holds over general fields. Green and Tao [GT07] proved this for large fields and left the case of small fields open. We answer their question affirmatively by showing that the "bias implies low rank" phenomenon is robust and holds over all fields.

Worst case to average case reductions for polynomials. Our second main theorem (Theorem 3) shows that every polynomial, not necessarily biased, that is approximated by few other bounded-degree polynomials can be computed by few bounded-degree polynomials. We view this result as a worst-case to average-case reduction for polynomials: in order to show that a polynomial cannot be approximated by few bounded-degree polynomials, it suffices to show that it cannot be computed by few bounded-degree polynomials. The latter task might be easier. An example where such a scenario is relevant is the following. The papers [GT07, LMS] that disprove the Inverse Conjecture for the Gowers Norm needed to show that the symmetric polynomial $S_4$ over $\mathbb{F}_2$, i.e.

$$S_4(x_1, \ldots, x_n) = \sum_{i<j<k<l} x_i x_j x_k x_l,$$

cannot be approximated by few degree-3 polynomials. Given the current result, it would be sufficient (and maybe easier) to show that $S_4$ cannot be computed by few degree-3 polynomials.
On the power of induction and relation to pseudorandom generators. A pseudorandom generator for degree-$d$ polynomials is an efficient procedure that stretches $s$ field elements into $n \gg s$ field elements that fool any polynomial of degree $d$ in $n$ variables. Pseudorandom generators are mostly interesting over small fields. One can use our first main theorem to provide an alternative proof of the correctness of the pseudorandom generators of [BV] that fool degree-$d$ polynomials. The argument used by [BV] relied on the Gowers Inverse Conjecture, which turned out to be false over small fields [GT07, LMS]. However, a closer inspection of the [BV] argument shows that for proving the correctness of their pseudorandom generators it suffices to rely on the statement that "bias implies low rank", which we prove here.

The "bias implies low rank" idea suggests a robust way to construct pseudorandom generators for some complex function classes based on pseudorandom generators for simpler function classes, via the following methodology. Either the function is unbiased, in which case the generator fools it, or it is a function of few functions of lower complexity. By induction, we thus obtain a construction of pseudorandom generators for higher complexity classes (e.g. degree-$d$ polynomials) given pseudorandom generators for lower complexity classes (e.g. linear functions).

Relation to testing concise representations. Diakonikolas et al. [DLMORSW] suggest a general methodology for testing whether a function on $n$ variables has a concise representation. The idea is to test by implicit learning. Their work provides property testers for several concise structures, among them $s$-sparse polynomials, size-$s$ algebraic circuits and more. Consider the following concise representation of degree-$d$ polynomials: a polynomial of degree $d$ has a concise representation if it is a function of few polynomials of lower degree (i.e. if it has low rank). We argue that one can use the "bias implies low rank" theorem to construct a tester for this concise representation. The tester first performs a low-degree test, e.g. by [RS], to check that the given polynomial has degree at most $d$; if the degree test rejects, the tester rejects. Otherwise, the tester approximates the bias of the polynomial. If the bias is large then by our first theorem the polynomial has low rank and the tester accepts; otherwise it rejects. The idea behind this approach is robust in the following sense. It suggests a methodology for testing concise representation of some family (e.g. depth-$d$ circuits) given a membership tester for that family, provided that the family obeys the "bias implies low rank" principle. If these two conditions are met, one can construct a tester: it first tests membership in the family and then estimates the bias. If the bias is high, the rank is low and a concise representation exists.
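A minimal sketch of the bias-estimation stage of the tester just described, assuming a prime field $\mathbb{F}_q$ and that $f$ has already passed the low-degree test (not implemented here). The acceptance threshold $\delta/2$, the Hoeffding-based sample count and the function names are our own illustrative choices, not parameters from the paper; with these choices the stage accepts, with probability at least $1 - \epsilon$, every $f$ with bias at least $\delta$, and rejects every $f$ with bias at most $\delta/8$.

```python
import cmath
import math
import random


def estimate_bias(f, n, q, samples, rng):
    """Monte Carlo estimate of |E_x[omega^{f(x)}]| from uniform samples x in F_q^n."""
    omega = cmath.exp(2j * cmath.pi / q)
    acc = 0j
    for _ in range(samples):
        x = tuple(rng.randrange(q) for _ in range(n))
        acc += omega ** (f(x) % q)
    return abs(acc) / samples


def bias_stage(f, n, q, delta, eps=0.01, seed=0):
    """Hypothetical bias stage of the rank tester: accept iff the estimated bias >= delta/2.

    Assumes f already passed a low-degree test. By Hoeffding's inequality applied to the
    real and imaginary parts (each in [-1, 1]), this many samples puts each part within
    delta/4 of its mean with probability >= 1 - eps/2, so the complex estimate is within
    sqrt(2)*delta/4 < delta/2 - delta/8 of the true mean with probability >= 1 - eps.
    """
    samples = math.ceil(32 / delta ** 2 * math.log(4 / eps))
    return estimate_bias(f, n, q, samples, random.Random(seed)) >= delta / 2


# Degree-3 monomial over F_2 (bias 3/4) versus a non-constant linear form (bias 0).
print(bias_stage(lambda x: x[0] * x[1] * x[2], n=10, q=2, delta=0.5))
print(bias_stage(lambda x: x[0], n=10, q=2, delta=0.5))
```

With high probability the first call accepts and the second rejects; between the thresholds $\delta/8$ and $\delta$ this sketch makes no guarantee.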

Extension to tensors. Let $L(x, y)$ be a bilinear form over $\mathbb{F}^n$, i.e. a function of the form

$$L(x, y) = x^t A y$$

where $x, y \in \mathbb{F}^n$ and $A$ is a matrix. There is a close connection between the rank of the matrix and the bias of $L$. Dixon's Theorem ([MS]) tells us that the bias of $L$ (and in fact, all non-zero Fourier coefficients of $L$) has absolute value $c(\mathbb{F})^{-\mathrm{rank}(A + A^t)}$. The theory of higher-dimensional multilinear forms, i.e. tensors, is much less understood. In particular, there is no single notion of tensor rank. We prove, as a direct corollary of Theorem 2, that if we define the rank of a tensor as the minimal number of lower-degree multilinear forms needed to compute it, then bias implies low rank for tensors.
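Dixon's theorem is usually stated for the quadratic form $q(x) = L(x, x) = x^t A x$. The sketch below checks the $\mathbb{F}_2$ case by brute force: every non-zero Fourier coefficient $|\mathbb{E}_x[(-1)^{q(x) + a \cdot x}]|$ equals $2^{-\mathrm{rank}(A + A^t)/2}$, which corresponds to $c(\mathbb{F}_2) = \sqrt{2}$. The helper names are hypothetical, and the restriction to $\mathbb{F}_2$ and tiny $n$ is only for simplicity.

```python
import itertools


def rank_gf2(rows):
    """Rank over F_2 of a matrix given as a list of row bitmasks (Gaussian elimination)."""
    rows, rank = list(rows), 0
    top = max(rows, default=0).bit_length()
    for bit in reversed(range(top)):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank


def fourier_magnitudes(A):
    """|E_x[(-1)^(x^t A x + a.x)]| over x in F_2^n, for every a in F_2^n."""
    n = len(A)
    points = list(itertools.product((0, 1), repeat=n))

    def value(x, a):
        quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        lin = sum(ai * xi for ai, xi in zip(a, x))
        return (quad + lin) % 2

    return [abs(sum(1 - 2 * value(x, a) for x in points)) / len(points) for a in points]


def dixon_check(A):
    """True iff every non-zero Fourier coefficient has magnitude 2^(-rank(A + A^t) / 2)."""
    n = len(A)
    sym_rows = [sum(((A[i][j] + A[j][i]) % 2) << j for j in range(n)) for i in range(n)]
    expected = 2.0 ** (-rank_gf2(sym_rows) / 2)
    return all(m == 0 or abs(m - expected) < 1e-12 for m in fourier_magnitudes(A))


print(dixon_check([[0, 1], [0, 0]]))  # True: q = x_1 x_2, rank 2, all coefficients 1/2
```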
Theorem 4. Let $L(X_1, \ldots, X_d)$ be a multilinear form of degree $d$ s.t. $\mathrm{bias}(L) \ge \delta > 0$. Then there exist degree-$(d-1)$ multilinear forms $q_1, \ldots, q_c$, each operating on $d-1$ of the variables $X_1, \ldots, X_d$, and a function $F : \mathbb{F}^c \to \mathbb{F}$, s.t.

$$L(X_1, \ldots, X_d) = F\big(q_1(X_1, \ldots, X_{t_1-1}, X_{t_1+1}, \ldots, X_d), \ldots, q_c(X_1, \ldots, X_{t_c-1}, X_{t_c+1}, \ldots, X_d)\big)$$

and $c = c(d, \delta)$. Moreover, $q_1, \ldots, q_c$ are derivatives of $L$.

Proof. We apply Theorem 2 to $L$ as a degree-$d$ polynomial, and observe that derivatives of $L$ are sums of $d$ degree-$(d-1)$ multilinear forms, each in $d-1$ of the variables $X_1, \ldots, X_d$. □

2.1 Proof Overview 

The proof starts with a lemma of Bogdanov and Viola, showing that if a degree-$d$ polynomial $p(X)$ is biased, then we can build a constant-size circuit that approximates it, whose inputs are degree-$(d-1)$ polynomials (in fact, derivatives of $p$).

The technical heart of the paper is the proof of the following statement (Lemma 11): a biased polynomial of degree $d$ that is approximated by few degree-$(d-1)$ polynomials can be computed by few degree-$(d-1)$ polynomials.

In general our proof structure is similar in spirit to that of [GT07]; however, there is a clear distinction between the two approaches that enables us to obtain the stronger result. The proof is by induction on the degree. An important notion in the proof is the following definition of a factor.

Definition 4 (Factor). A factor is a set of polynomials $g_1, \ldots, g_m : \mathbb{F}^n \to \mathbb{F}$. The number $m$ of polynomials in the set is the dimension of the factor, and their maximum degree is the order of the factor. The polynomials of the factor divide the hypercube $\mathbb{F}^n$ into $|\mathbb{F}|^m$ parts according to their joint image. Each such part is called a region.
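Regions of a factor are easy to enumerate for tiny $n$. The sketch below (illustrative names, brute force over $\mathbb{F}_q^n$) counts how many points fall in each of the $|\mathbb{F}|^m$ regions: a factor of linearly independent linear forms splits $\mathbb{F}^n$ into equal regions, while a biased polynomial such as $x_1 x_2$ over $\mathbb{F}_2$ does not.

```python
import itertools
from collections import Counter


def region_sizes(gs, n, q):
    """Number of points of F_q^n in each region of the factor {g_1, ..., g_m}.

    Keys are all q**m possible joint values (g_1(x), ..., g_m(x)); empty regions map to 0.
    Brute force over all q**n points, so only for tiny n.
    """
    counts = Counter(
        tuple(g(x) % q for g in gs) for x in itertools.product(range(q), repeat=n)
    )
    return {v: counts[v] for v in itertools.product(range(q), repeat=len(gs))}


# Independent linear forms: 4 regions of equal size 2^(3-2) = 2.
print(region_sizes([lambda x: x[0] + x[1], lambda x: x[2]], n=3, q=2))
# {(0, 0): 2, (0, 1): 2, (1, 0): 2, (1, 1): 2}
# The biased quadratic x1*x2: regions of unequal size.
print(region_sizes([lambda x: x[0] * x[1]], n=3, q=2))  # {(0,): 6, (1,): 2}
```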

All regions are of almost the same size. The first step of the proof shows that, given a set of polynomials of degree at most $d-1$ that approximates a degree-$d$ polynomial $p$ (i.e. given a factor that approximates $p$), one can transform that factor into another factor of constant size that approximates $p$ and in which all of the regions are roughly of the same size. To obtain this we need the following notion of regularity.

Definition 5 (Regularity, informal). A factor is regular if the joint distribution of its polynomials is close to uniform; see the formal definition in Definition 8.

The Regularity Lemma (see Lemma 5) shows that a factor of constant dimension that approximates $p$ can be transformed into another constant-dimension factor which is regular and approximates $p$. Moreover, in a regular factor all regions are roughly of the same size (see Lemma 12).

Averaging arguments. Since all regions are roughly of the same size, we can use averaging arguments to claim that most regions have large agreement with the polynomial $p$. These are called almost good regions. The rest of the regions are called bad regions, but there are only few of them. We then show that almost good regions are good (i.e. they fully agree with $p$, see Lemma 13). We further show that $p$ must be constant over the bad regions (Lemma 15). Hence the polynomial $p$ is a function of the factor, and can be computed (and not only approximated) by the factor. So the heart of the proof is to show that almost good regions are good and that $p$ is constant on the few bad regions.

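Almost good regions are good, and $p$ is fixed on the rest: a wishful scenario. For a set of variables $Y_1, Y_2, \ldots \in \mathbb{F}^n$ we denote $Y_I = \sum_{i \in I} Y_i$ (and similarly $y_I$ for values). Since $p(x)$ is a polynomial of degree $d$, its $(d+1)$-th derivatives vanish, so it satisfies cube constraints of the following form:

$$p(x) = \sum_{I \subseteq [d+1],\ |I| > 0} (-1)^{|I|+1}\, p(x + y_I)$$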
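The cube constraint can be sanity-checked numerically. The following sketch, with hypothetical helper names, draws a random polynomial of total degree at most $d$ over $\mathbb{F}_q$ (including cases with $d \ge q$, the regime of interest here) and verifies the constraint at random points with $d+1$ random directions.

```python
import itertools
import random


def random_poly(n, d, q, rng):
    """A random polynomial over F_q in n variables of total degree <= d, as a callable."""
    exponents = [e for e in itertools.product(range(d + 1), repeat=n) if sum(e) <= d]
    coeffs = {e: rng.randrange(q) for e in exponents}

    def p(x):
        total = 0
        for e, c in coeffs.items():
            term = c
            for xi, ei in zip(x, e):
                term *= xi ** ei
            total += term
        return total % q

    return p


def cube_constraint_holds(p, x, ys, q):
    """Check p(x) == sum_{I nonempty} (-1)^(|I|+1) p(x + y_I)  (mod q), y_I = sum_{i in I} y_i."""
    n, k = len(x), len(ys)
    rhs = 0
    for size in range(1, k + 1):
        for I in itertools.combinations(range(k), size):
            point = tuple((x[t] + sum(ys[i][t] for i in I)) % q for t in range(n))
            rhs += (-1) ** (size + 1) * p(point)
    return p(x) % q == rhs % q


# With d + 1 directions the constraint holds for every polynomial of degree <= d,
# including d >= q (here d = 3 over F_2).
rng = random.Random(0)
p = random_poly(n=3, d=3, q=2, rng=rng)
x = tuple(rng.randrange(2) for _ in range(3))
ys = [tuple(rng.randrange(2) for _ in range(3)) for _ in range(4)]
print(cube_constraint_holds(p, x, ys, q=2))  # True
```

With only $d$ directions the identity can fail: for $p = x_1 x_2$ over $\mathbb{F}_2$, $x = 0$ and directions $e_1, e_2$, the right-hand side is $1$ while $p(0) = 0$.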

We show that for every point $x$ that belongs to an almost good region, there is a constraint that passes through it and all of whose other points (i.e. the points $x + y_I$ for $|I| > 0$) belong to the good part of the same region. As $p(x + y_I) = c$ for all good points in the region, we get that also $p(x) = c$, since $\sum_{j=1}^{d+1} (-1)^{j+1}\binom{d+1}{j} = 1$. Hence the value $p(x)$ is constant on the region.

A question of interest here is: given $x$, what is the probability that $x + y_I$ is in the region of $x$ for every $I$? Since the assignment of points to regions is determined by the values of the polynomials $g_1, \ldots, g_m$ that compose the factor, saying that $x + y_I$ is in the region of $x$ for every $I$ is equivalent to the following condition:

$$[\,g_i(x) = g_i(x + y_I) \text{ for all } i \in [m] \text{ and } I \subseteq [d+1]\,]$$

One can observe that, due to the dependence between derivatives, for this condition to hold it is sufficient to require the following:

$$[\,g_i(x) = g_i(x + y_I) \text{ for all } i \in [m] \text{ and } I \subseteq [d+1] \text{ s.t. } 1 \le |I| \le \deg(g_i)\,]$$

Note that this condition by itself is not sufficient to ensure that $p(x)$ is assigned the value of the good part of the region. One also needs to add the requirement that none of the points $x + y_I$ falls in the bad part of the almost good region. Once the calculations are made (Corollary 8), one can see that if all the events that compose the above conditions were independent, then $p(x)$ would get the correct value. Hence we could say that all the almost good regions are totally good (Lemma 13). So $p$ agrees with the factor on all regions but few. Given this, we show that $p$ is fixed on the bad regions as well (Lemma 15). However, the above arguments work under the wishful assumption that all the considered events are independent. Much of the technical effort of this work goes into showing that the joint distribution of these events is close to uniform, i.e. that the events are almost independent.

Obtaining almost independence through interpolation. One way to prove the almost-independence claim is to relate the distance from independence to the bias of the $d$-derivatives of the $g_i$'s that compose the factor. Using interpolation one can relate the bias of the $d$-derivatives of the $g_i$'s to the bias of the $g_i$'s themselves, which should be small by the regularity assumption on the factor. The use of interpolation in this argument is crucial, and it is what restricts [GT07] to the case $d < |\mathbb{F}|$.
Our approach for dealing with general fields. In order to eliminate the need for interpolation, which works only over large fields, and in order to be able to claim that $p$ is indeed fixed on the bad regions, we define a stronger notion of regularity. The stronger regularity roughly requires uniformity of the $d$-derivatives of the polynomials in the factor.

Definition 6 (Strong Regularity, informal). A factor is strongly regular if the joint distribution of the $d$-derivatives of its polynomials is close to uniform; see the formal definition in Definition 11.

Based on this stronger regularity we prove counting lemmas (Lemmas 6 and 7) that let us obtain almost independence in the above sense without the need for interpolation. Thus, we obtain our results for general fields. We now discuss the usefulness of strong regularity. Under strong regularity we allow all the polynomials $g_1, \ldots, g_m$ to participate in the calculation of $g_i(x + y_I)$, in contrast to the definition of Green and Tao, which allowed only evaluations of the same $g_i$. This gives rise to the notion of an "independence degree" $\Delta(g_i)$ of a polynomial (which is between $1$ and $\deg(g_i)$, and could be strictly lower than $\deg(g_i)$ given the other derivatives in the factor).
Thus, instead of requiring a uniform joint distribution over the following set of events, as Green and Tao do:

$$[\,g_i(x) = g_i(x + y_I) \text{ for all } i \in [m] \text{ and } I \subseteq [d+1] \text{ s.t. } 1 \le |I| \le \deg(g_i)\,]$$

we only require a uniform joint distribution over a subset of the events, namely:

$$[\,g_i(x) = g_i(x + y_I) \text{ for all } i \in [m] \text{ and } I \subseteq [d+1] \text{ s.t. } 1 \le |I| \le \Delta(g_i)\,]$$

It turns out that this stronger notion of independence allows us to get almost independence between the variables without the need for integration, which makes our result work for every field.

In the following we give an example where $\Delta(g_i)$ could be strictly smaller than $\deg(g_i)$.
Example 1. Consider the symmetric polynomial $S_4$ over $\mathbb{F}_2$, i.e.

$$S_4(X) = \sum_{i<j<k<l} X_i X_j X_k X_l$$

Consider the fourth derivative of $S_4$, i.e. the polynomial in $X, Y_1, \ldots, Y_4$

$$G(X, Y_1, \ldots, Y_4) = \sum_{I \subseteq [4]} S_4(X + Y_I)$$

This polynomial corresponds to the 4th Gowers norm of $S_4$, and, as was shown in [GT07] and [LMS], it has bias $1/8$. In particular, it cannot be independent (and so we would have defined $\Delta(S_4)$ to be at most $3$).

The strong regularity for polynomials that we define here might find some future applications. 
2.2 Organization 

The rest of the paper is organized as follows. We define required notation in Section 3. We define and analyze regular and strongly regular factors in Section 4. We show that strong regularity implies almost independence in Section 5. We prove Theorem 2 in Section 6.
3 Preliminaries



$\mathbb{F}$ is a fixed prime field. We work with constant-degree polynomials over $\mathbb{F}^n$. We denote by capital letters $X, Y, \ldots$ variables in $\mathbb{F}^n$, and by small letters $x, y, a, \ldots$ values in $\mathbb{F}^n$. Degree of a polynomial always means total degree. Unless otherwise specified, when we speak of a degree-$d$ polynomial, we in fact mean a polynomial of total degree at most $d$. For a set of variables $Y_1, Y_2, \ldots \in \mathbb{F}^n$ we denote $Y_I = \sum_{i \in I} Y_i$, and similarly for a set of values $y_1, y_2, \ldots \in \mathbb{F}^n$. We write $u = v(1 \pm \epsilon)$ for $u \in [v(1-\epsilon), v(1+\epsilon)]$. When we speak of a growth function, we mean any monotone function $\mathcal{F} : \mathbb{N} \to \mathbb{N}$ (for example, $\mathcal{F}(n) = 2^{n^2}$).

Definition 7 (Derivative space of a polynomial). For a polynomial $f(X)$, we define its derivative space to be the set

$$\mathrm{Der}(f) = \{ f(X + a) - f(X) : a \in \mathbb{F}^n \}$$

Notice that if $\deg(f) = k$, then all polynomials in $\mathrm{Der}(f)$ have degree at most $k-1$.
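The degree drop in Definition 7 can be observed directly over $\mathbb{F}_2$, where the degree of a function can be read off its algebraic normal form, computed by the binary Möbius transform. The sketch below is illustrative (the names are ours) and enumerates all $2^n$ points.

```python
import itertools


def anf_degree(f, n):
    """Degree of f: F_2^n -> F_2, read off its algebraic normal form.

    The binary Moebius transform turns the truth table into ANF coefficients:
    afterwards table[mask] is the coefficient of the monomial prod_{i in mask} x_i.
    Returns -1 for the zero function.
    """
    table = [f(tuple((idx >> i) & 1 for i in range(n))) % 2 for idx in range(2 ** n)]
    for i in range(n):
        for idx in range(2 ** n):
            if (idx >> i) & 1:
                table[idx] ^= table[idx ^ (1 << i)]
    return max((bin(mask).count("1") for mask in range(2 ** n) if table[mask]), default=-1)


def derivative(f, a):
    """The element f(X + a) - f(X) of Der(f), over F_2."""
    return lambda x: (f(tuple((xi + ai) % 2 for xi, ai in zip(x, a))) - f(x)) % 2


f = lambda x: x[0] * x[1] * x[2] + x[3]  # degree 3 in 4 variables
print(anf_degree(f, 4))  # 3
print(max(anf_degree(derivative(f, a), 4) for a in itertools.product((0, 1), repeat=4)))  # 2
```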

4 Regularity of polynomials 

As we discussed in the introduction, the notion of regularity plays a major role in our proof. Green and Tao [GT07] suggested one notion of regularity (henceforth referred to as regularity), which limited their proof to large fields (i.e. $d < |\mathbb{F}|$). We suggest a stronger notion of regularity (henceforth called strong regularity). This new notion of strong regularity is essential for obtaining a result over general fields. In the following we review the regularity definitions given by Green and Tao. Then we present the notion of strong regularity and show that every constant-size factor that approximates a polynomial $p$ can be transformed into a constant-size factor that approximates $p$ and is also strongly regular. We end by showing that strong regularity implies almost independence for sets of variables that form some specific structures. This almost independence is the crux of the proof of Theorem 2.

Definition 8 (Regularity of polynomials). Let $\mathcal{F}$ be any growth function. A set of polynomials $\{g_1, \ldots, g_m\}$ is called $\mathcal{F}$-regular if no linear combination $\alpha_1 g_1(X) + \ldots + \alpha_m g_m(X)$, with $\alpha_1, \ldots, \alpha_m$ not all zero, can be expressed as a function of at most $\mathcal{F}(m)$ polynomials of degree $k-1$, where $k = \max\{\deg(g_i) : \alpha_i \ne 0\}$ (i.e. $k$ is the maximal degree of a $g_i$ appearing in the linear combination).

Green and Tao also define the notion of a refinement of a set of polynomials. Informally, a set $\{g_1, \ldots, g_m\}$ is a refinement of $\{f_1, \ldots, f_s\}$ if for any $i \in [s]$, $f_i(x)$ can be computed given the values $g_1(x), \ldots, g_m(x)$.

Definition 9 (Refinement). A set of polynomials $\{g_1, \ldots, g_m\}$ is a refinement of $\{f_1, \ldots, f_s\}$ if for any $i \in [s]$ there exists a function $F_i : \mathbb{F}^m \to \mathbb{F}$ s.t.

$$f_i(X) = F_i(g_1(X), \ldots, g_m(X))$$

Green and Tao prove that for any growth function $\mathcal{F}$, any set of polynomials $F = \{f_1, \ldots, f_s\}$ can be refined to an $\mathcal{F}$-regular set $\{g_1, \ldots, g_m\}$ s.t. $m$ depends only on $s$, $\mathcal{F}$ and the maximal degree in $F$. Importantly, $m$ is independent of $n$. Green and Tao's proof starts with a set $\{f_1, \ldots, f_s\}$ which approximates $p(X)$ weakly, transforms it into a set which approximates $p$ on almost all points, and then uses the regularity condition to show that it must in fact compute $p$ exactly. In order to prove this, they need to analyze the joint distribution of

$$\Big\{ g_i\Big(X + \sum_{t \in J} Y_t\Big) : i \in [m],\ J \subseteq [D] \Big\}$$

where $D = O(d)$ and $X, Y_1, \ldots, Y_D \in \mathbb{F}^n$ are independent variables. Let us denote $Y_J = \sum_{t \in J} Y_t$. They prove that if we just look at the subset

$$\{ g_i(x + Y_I) : i \in [m],\ I \subseteq [D],\ |I| \le \deg(g_i) \}$$

for any $x \in \mathbb{F}^n$, then these variables must be almost independent, and for any $|I| > \deg(g_i)$, $g_i(x + Y_I)$ is determined by $\{ g_i(x + Y_J) : J \subseteq I,\ |J| \le \deg(g_i) \}$. Since the regularity requirement was for evaluations $\{g_i(X) : i \in [m]\}$, they needed to use integration over $\mathbb{F}$ to get the independence result for evaluations on hypercubes, which limited their proof to $d < \mathrm{char}(\mathbb{F})$.

We follow a similar approach, but in order to allow $d \ge \mathrm{char}(\mathbb{F})$, we allow more freedom in the set of variables which are almost independent or fixed given the others. For a set of polynomials $g_1, \ldots, g_m$, we also have an "independence degree" $\Delta(g_i)$ for every $g_i$ (which is between $1$ and $\deg(g_i)$). Instead of requiring, as Green and Tao do, that

$$\{ g_i(x + Y_I) : i \in [m],\ I \subseteq [D],\ |I| \le \deg(g_i) \}$$

are almost independent, we demand only that

$$\{ g_i(x + Y_I) : i \in [m],\ I \subseteq [D],\ |I| \le \Delta(g_i) \}$$

are almost independent. However, we also demand that for any $i \in [m]$ and $|I| > \Delta(g_i)$, the value of $g_i(x + Y_I)$ can be determined by $\{ g_j(x + Y_J) : j \in [m],\ J \subseteq I,\ |J| \le \Delta(g_j) \}$. Notice that we allow all the polynomials $g_1, \ldots, g_m$ to participate in the calculation of $g_i(x + Y_I)$, in contrast to the definition of Green and Tao, which allowed only evaluations of the same $g_i$. It turns out that this stronger notion of independence allows us to get almost independence between the variables without the need for integration, which makes our result work for every field.

We now formally define our notion of strong regularity and show that it implies the almost independence/total dependence structure we have just described. We first extend the definition of a derivative space to several polynomials in several variable sets.

Definition 10 (Derivative space). For a set of polynomials $F = \{f_1(X), \ldots, f_s(X)\}$ we define:

$$\mathrm{Der}(F) = \{ f_i(X + a) - f_i(X) : i \in [s],\ a \in \mathbb{F}^n \}$$

Similarly, for a set of polynomials in several variables $F = \{f_1(Y_1, \ldots, Y_k), \ldots, f_s(Y_1, \ldots, Y_k)\}$ (with $Y_1, \ldots, Y_k \in \mathbb{F}^n$) we define:

$$\mathrm{Der}(F) = \{ f_i(Y_1 + a_1, \ldots, Y_k + a_k) - f_i(Y_1, \ldots, Y_k) : i \in [s],\ a_1, \ldots, a_k \in \mathbb{F}^n \}$$

Notice that if the maximal degree of polynomials in $F$ is $k$, then the maximal degree of polynomials in $\mathrm{Der}(F)$ is at most $k-1$. We now define strong regularity.






Definition 11 (Strong regularity of polynomials). Let $\mathcal{F}$ be any growth function. Let $G = \{g_1, \ldots, g_m\}$ be a set of polynomials and $\Delta : G \to \mathbb{N}$ a mapping from $G$ to the natural numbers. We say the set $G$ is strong $\mathcal{F}$-regular with degree bound $\Delta$ if:

1. For any $i \in [m]$, $1 \le \Delta(g_i) \le \deg(g_i)$.

2. For any $i \in [m]$ and $r > \Delta(g_i)$, let $X$ and $Y_1, Y_2, \ldots, Y_r$ be variables in $\mathbb{F}^n$. There exists a function $F_{i,r}$ s.t.

$$g_i(X + Y_{[r]}) = F_{i,r}\big( g_j(X + Y_J) : j \in [m],\ J \subseteq [r],\ |J| \le \Delta(g_j) \big)$$

3. For any $r > 0$, let $X$ and $Y_1, \ldots, Y_r$ be variables in $\mathbb{F}^n$. Let $\{\alpha_{i,I}\}_{i \in [m],\, I \subseteq [r],\, |I| \le \Delta(g_i)}$ be any collection of field elements, not all zero. Let $a(X, Y_1, \ldots, Y_r)$ stand for the linear combination:

$$a(X, Y_1, \ldots, Y_r) = \sum_{i \in [m],\ I \subseteq [r],\ |I| \le \Delta(g_i)} \alpha_{i,I}\, g_i(X + Y_I)$$

Let $G' \subseteq G$ be the set of all $g_i$'s which appear in $a$, i.e.:

$$G' = \{ g_i \in G : \exists I\ \alpha_{i,I} \ne 0 \}$$

There do not exist polynomials $h_1, \ldots, h_l \in \mathrm{Der}(G')$, $l \le \mathcal{F}(m)$, s.t. $a(X, Y_1, \ldots, Y_r)$ can be expressed as:

$$H\big(h_1(X + Y_{I_1}), \ldots, h_l(X + Y_{I_l})\big)$$
for some $I_1, \ldots, I_l \subseteq [r]$ and some function $H : \mathbb{F}^l \to \mathbb{F}$.

If the set $G$ satisfies only (1) and (2), we say $G$ is pre-strong-regular (notice that $\mathcal{F}$ appears only in (3)).

We first prove, similarly to the proof in [GT07], that any set of polynomials can be refined to a strong $\mathcal{F}$-regular set, where the size of the resulting set depends only on $\mathcal{F}$, the size of the original set and the maximal degree of polynomials in it. Also, the refining set is contained in the space of iterated derivatives of the original polynomials.

We now formally define the space of iterated derivatives. 

Definition 12 (Space of iterated derivatives). For a polynomial set $F$, we define its iterated derivative set $\mathrm{Der}_C(F)$ to be the set obtained by taking at most $C$ derivatives of $F$, i.e.

$$\mathrm{Der}_0(F) = F$$

$$\mathrm{Der}_C(F) = \mathrm{Der}(\mathrm{Der}_{C-1}(F)) \cup \mathrm{Der}_{C-1}(F)$$

Lemma 5 (Strong-Regularity Lemma). Let $\mathcal{F}$ be any growth function. Let $F = \{f_1, \ldots, f_s\}$ be a set of polynomials of maximal degree $k$. There exists a refinement $G = \{g_1, \ldots, g_m\}$ of $F$ s.t.

1. The maximal degree of polynomials in $G$ is also at most $k$.

2. The set $G$ is strong $\mathcal{F}$-regular.

3. The size $m$ of $G$ is a function of only $\mathcal{F}$, $s$ and $k$. Importantly, it is independent of $n$.

4. There exists $C = C(\mathcal{F}, s, k)$ s.t. $G \subseteq \mathrm{Der}_C(F)$.

Proof. We start by defining a pre-strong-regular set $G$ from $F$, and keep refining it until we reach a strong $\mathcal{F}$-regular set. At the $i$-th iteration our set $G$ will be contained in $\mathrm{Der}_i(F)$. We finish by showing that the refinement process must end after a finite number of steps.

We start by defining $\Delta : F \to \mathbb{N}$ by $\Delta(f_i) = \deg(f_i)$, and set the initial value of $G$ to be $F$. To show that the initial $G$ is pre-strong-regular with degree bound $\Delta$, observe that for any $r > \deg(f_i)$, differentiating $f_i$ $r$ times yields the zero polynomial. Thus, if $Y_1, \ldots, Y_r$ are variables, we have the identity:

$$f_i(X + Y_{[r]}) = \sum_{I \subsetneq [r]} (-1)^{r - |I| + 1} f_i(X + Y_I)$$

Since we can do this for any $r > \deg(f_i)$, we can apply the identity repeatedly to the terms with $|I| > \deg(f_i)$ and express $f_i(X + Y_{[r]})$ as a linear combination of $\{ f_i(X + Y_I) : I \subseteq [r],\ |I| \le \deg(f_i) \}$. Thus, $G$ is pre-strong-regular with degree bound $\Delta$.

We continue to refine $G$ as long as it is not strong $\mathcal{F}$-regular. Assume $G = \{g_1, \ldots, g_m\}$ at some iteration is not strong $\mathcal{F}$-regular. By definition, there is some $r > 0$ and coefficients $\{\alpha_{i,I}\}_{i \in [m],\, I \subseteq [r],\, |I| \le \Delta(g_i)}$, not all zero, s.t. the linear combination:

$$a(X, Y_1, \ldots, Y_r) = \sum_{i \in [m],\ I \subseteq [r],\ |I| \le \Delta(g_i)} \alpha_{i,I}\, g_i(X + Y_I)$$

can be expressed as a function of $l \le \mathcal{F}(m)$ polynomials $h_1, \ldots, h_l \in \mathrm{Der}(G')$, where $G' = \{ g_i : \exists I\ \alpha_{i,I} \ne 0 \}$ is the set of all $g_i$'s participating in the linear combination.

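Let $g_i$ be a polynomial of maximal degree $k$ in $G'$ and let $I_0$ be a maximal set $I$ with respect to inclusion s.t. $\alpha_{i,I} \ne 0$. Notice that we must have $|I_0| \le \Delta(g_i)$. We have:

$$\sum_{j \in [m],\ I \subseteq [r],\ |I| \le \Delta(g_j)} \alpha_{j,I}\, g_j(X + Y_I) = H\big(h_1(X + Y_{J_1}), \ldots, h_l(X + Y_{J_l})\big)$$

for some function $H : \mathbb{F}^l \to \mathbb{F}$.

Notice first that $\deg(h_t) \le k - 1$ for all $t \in [l]$. Substitute $Y_j = 0$ in the expression for all $j \notin I_0$. By the maximality of $I_0$, the term $\alpha_{i,I_0} g_i(X + Y_{I_0})$ is the only term in which $g_i$ is evaluated at $X + Y_{I_0}$, so we get that $g_i(X + Y_{I_0})$ can be expressed as a function of $\{ g_i(X + Y_J) : J \subsetneq I_0 \}$, $\{ g_j(X + Y_J) : j \ne i,\ J \subseteq I_0,\ |J| \le \Delta(g_j) \}$ and $\{ h_t(X + Y_J) : t \in [l],\ J \subseteq I_0,\ |J| \le \deg(h_t) \}$. Thus, if we add the polynomials $h_1, \ldots, h_l$ to $G$ (and set $\Delta(h_t) = \deg(h_t)$), we can reduce $\Delta(g_i)$ to $|I_0| - 1$. If we reduced it to zero, we can remove $g_i$ entirely from $G$. The resulting $G$ will be our set for the next iteration.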

In order to prove that the refinement process ends after a finite number of iterations (depending on the initial size of $F$ and its maximal degree), notice that at each iteration the sum of $\Delta(g_i)$ over all $g_i \in G$ of some degree $d'$ decreases by at least 1, while the new polynomials added are all of degree strictly smaller than $d'$, and their number is bounded (as a function of $\mathcal{F}$ and the size of $G$ at the beginning of the iteration). So the total number of iterations is bounded by some Ackermann-like function of the initial number of polynomials, their maximal degree and the growth function $\mathcal{F}$. □
5 Almost independence by strong regularity

In the following we prove that strong regularity induces an almost independence/total dependence structure over general sets of variables. The following lemmas are the main technical building blocks in the proof of Theorem 2.

We start by proving a lemma relating the values of the $g_i$'s on sums below the degree bound $\Delta$ to their values on all sums over a set of variables.

Lemma 6. Let $G = \{g_1, \ldots, g_m\}$ be a strong-regular set with degree bound $\Delta$. Let $x, x' \in \mathbb{F}^n$ be two points s.t. $g_i(x) = g_i(x')$ for all $i \in [m]$. Let $y'_1, \ldots, y'_k \in \mathbb{F}^n$ be values for some $k \ge 1$, and let $Y_1, \ldots, Y_k \in \mathbb{F}^n$ be $k$ random variables. Then the following two events are equivalent:

1. $A = [\,g_i(x + Y_I) = g_i(x' + y'_I) \text{ for all } i \in [m] \text{ and } I \subseteq [k]\,]$

2. $B = [\,g_i(x + Y_I) = g_i(x' + y'_I) \text{ for all } i \in [m] \text{ and } I \subseteq [k] \text{ s.t. } 1 \le |I| \le \Delta(g_i)\,]$

Proof. It is obvious that if $A$ holds then also $B$ holds. Assume that $B$ holds, i.e. that

$$g_i(x + Y_I) = g_i(x' + y'_I)$$

for all $i \in [m]$ and $I \subseteq [k]$ s.t. $|I| \le \Delta(g_i)$ (for $I = \emptyset$ this is the assumption $g_i(x) = g_i(x')$). Take some $I$ s.t. $|I| > \Delta(g_i)$. We need to show that also $g_i(x + Y_I) = g_i(x' + y'_I)$. Since $|I| > \Delta(g_i)$, we know by the strong regularity of $G$ that there is a function $F_{i,I}$ s.t.

$$g_i(X + Y_I) = F_{i,I}\big( g_j(X + Y_J) : j \in [m],\ J \subseteq I,\ |J| \le \Delta(g_j) \big)$$

By first substituting $X = x$ to compute $g_i(x + Y_I)$, and then substituting $X = x'$ and $Y_j = y'_j$ to compute $g_i(x' + y'_I)$, and using both that $g_j(x) = g_j(x')$ for all $j \in [m]$ and the assumption that $B$ holds, we get that also $g_i(x + Y_I) = g_i(x' + y'_I)$. □

We now prove a lemma showing that the evaluations of each $g_i$ on sums of at most $\Delta(g_i)$ of the random variables are simultaneously almost independent, provided that $\mathcal{F}$ is large enough. Remember that we are in the process of proving Theorem 2 for degree $d$ by induction. Thus, we assume it holds for all degrees $d' < d$, and in particular for all linear combinations of $g_1, \ldots, g_m$.

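Lemma 7. Let $\gamma=\gamma(m)$ be an error term. Let $Y_1,\ldots,Y_k\in\mathbb{F}^n$ be random variables for some $k\ge1$. Assume $\mathcal{F}$ is large enough (as a function of $\gamma$ and $k$). Assume $g_1,\ldots,g_m$ are strong $\mathcal{F}$-regular with degree bound $\Delta$. For any non-empty $I\subseteq[k]$ let $x_I\in\mathbb{F}^n$ be some point, and $a^{(I)}=(a^{(I)}_1,\ldots,a^{(I)}_k)\in\mathbb{F}^k$ s.t.

• $a^{(I)}_j\neq0$ for all $j\in I$

• $a^{(I)}_j=0$ for all $j\notin I$
Then the joint distribution of

$$\Big\{g_i\big(x_I+\textstyle\sum_{j\in I}a^{(I)}_jY_j\big): i\in[m],\ I\subseteq[k],\ 1\le|I|\le\Delta(g_i)\Big\}$$
is $\gamma$-close to the uniform distribution on $\mathbb{F}^{\sum_{i=1}^m\sum_{j=1}^{\Delta(g_i)}\binom{k}{j}}$.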
Before proving Lemma 7 we give an immediate corollary of it and Lemma 6 (use Lemma 6 to restrict attention to $1\le|I|\le\Delta(g_i)$, and Lemma 7 with $x_I=x$ and $a^{(I)}_j=1$ for $j\in I$).

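Corollary 8. Let $x,x'\in\mathbb{F}^n$ be two points s.t. $g_i(x)=g_i(x')$ for all $i\in[m]$. Let $y'_1,\ldots,y'_k\in\mathbb{F}^n$ be values for some $k\ge1$, and let $Y_1,\ldots,Y_k\in\mathbb{F}^n$ be random variables. Then

$$\Pr[g_i(x+Y_I)=g_i(x'+y'_I)\ \forall i\in[m],\ I\subseteq[k]]=|\mathbb{F}|^{-\sum_{i=1}^m\sum_{j=1}^{\Delta(g_i)}\binom{k}{j}}(1\pm\gamma)$$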
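The exponent in Corollary 8 is just the number of pairs $(i,I)$ with $i\in[m]$, $I\subseteq[k]$ and $1\le|I|\le\Delta(g_i)$, i.e. the number of coordinates in the joint distribution of Lemma 7. A minimal Python sketch computes it both by the formula and by explicit enumeration. The degree bounds in `deltas` are hypothetical values, not taken from the text. For $k=1$ the count equals $m$, which is the case used in the proof of Lemma 12.

```python
from itertools import combinations
from math import comb

def exponent_formula(deltas, k):
    # sum_i sum_{j=1}^{Delta(g_i)} binom(k, j); comb(k, j) is 0 when j > k
    return sum(comb(k, j) for delta in deltas for j in range(1, delta + 1))

def exponent_enumerated(deltas, k):
    # Count pairs (i, I) with I a subset of [k] and 1 <= |I| <= Delta(g_i)
    count = 0
    for delta in deltas:
        for size in range(1, min(delta, k) + 1):
            count += sum(1 for _ in combinations(range(k), size))
    return count

deltas = [1, 2, 3]  # hypothetical degree bounds Delta(g_1), Delta(g_2), Delta(g_3)
for k in (1, 2, 4):
    print(k, exponent_formula(deltas, k), exponent_enumerated(deltas, k))
```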

We need the following simple lemma for the proof of Lemma 7. It states that a random derivative of a biased polynomial is also biased.

Lemma 9. Let $h(Y_1,\ldots,Y_k)$ be a polynomial with bias $\delta$. Let $h'$ be the derivative of $h$ in the variables $Y_1,\ldots,Y_r$ in directions $Z_1,\ldots,Z_r$ ($r\le k$), i.e.

$$h'(Y_1,\ldots,Y_k,Z_1,\ldots,Z_r)=\sum_{w\in\{0,1\}^r}(-1)^{|w|}h(Y_1+w_1Z_1,\ldots,Y_r+w_rZ_r,Y_{r+1},\ldots,Y_k)$$

where $|w|$ denotes the Hamming weight of $w$. Then $\mathrm{bias}(h')\ge|\delta|^{2^r}$.

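Proof. We apply Cauchy-Schwarz. It is enough to prove the claim for $k=2$ and $r=1$, because we can group variables and then apply the one-direction bound $r$ times (each application squares the lower bound):

$$\mathrm{bias}(h')=\mathbb{E}_{Y_1,Y_2,Z_1\in\mathbb{F}^n}\big[\omega^{h(Y_1,Y_2)-h(Y_1+Z_1,Y_2)}\big]=\mathbb{E}_{Y_2\in\mathbb{F}^n}\Big[\big|\mathbb{E}_{Y_1\in\mathbb{F}^n}[\omega^{h(Y_1,Y_2)}]\big|^2\Big]\ge\Big|\mathbb{E}_{Y_1,Y_2\in\mathbb{F}^n}[\omega^{h(Y_1,Y_2)}]\Big|^2=|\delta|^2$$

□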
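The chain of (in)equalities in the proof of Lemma 9 can be checked numerically on a toy example. The sketch below assumes $\mathbb{F}=\mathbb{F}_3$ and $n=1$, so $Y_1,Y_2,Z_1$ are single field elements. It uses the hypothetical polynomial $h(Y_1,Y_2)=Y_1^2Y_2$ and computes $\mathrm{bias}(h)$, $\mathrm{bias}(h')$ for the derivative in $Y_1$ along $Z_1$, and the middle term $\mathbb{E}_{Y_2}|\mathbb{E}_{Y_1}\omega^{h}|^2$.

```python
import cmath
from itertools import product

P = 3                                   # prime field size
OMEGA = cmath.exp(2j * cmath.pi / P)    # primitive P-th root of unity
FIELD = range(P)

def h(y1, y2):
    # Hypothetical example polynomial over F_3 (n = 1): h(Y1, Y2) = Y1^2 * Y2
    return (y1 * y1 * y2) % P

def h_prime(y1, y2, z1):
    # Derivative of h in Y1 along Z1: h(Y1, Y2) - h(Y1 + Z1, Y2)
    return (h(y1, y2) - h(y1 + z1, y2)) % P

def bias(f, nvars):
    # E[omega^f] over uniformly random inputs in F_P^nvars
    points = list(product(FIELD, repeat=nvars))
    return sum(OMEGA ** f(*v) for v in points) / len(points)

b = bias(h, 2)
b_prime = bias(h_prime, 3)
# Middle term of the Cauchy-Schwarz chain: E_{Y2} |E_{Y1} omega^{h(Y1, Y2)}|^2
middle = sum(abs(sum(OMEGA ** h(y1, y2) for y1 in FIELD) / P) ** 2
             for y2 in FIELD) / P
print(abs(b) ** 2, b_prime, middle)
```

For this $h$ one gets $\mathrm{bias}(h)=1/3$ and $\mathrm{bias}(h')=5/9\ge(1/3)^2$. Note that $\mathrm{bias}(h')$ is always real and non-negative, being an average of squared absolute values.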

Proof (of Lemma 7). We start by using the well-known fact that if a distribution over $\mathbb{F}^r$ is not uniform, it must have some biased non-trivial linear functional. If the distribution we study is $\gamma$-far from uniform, then there must be a non-trivial linear functional on

$$\Big\{g_i\big(x_I+\textstyle\sum_{j\in I}a^{(I)}_jY_j\big): i\in[m],\ I\subseteq[k],\ 1\le|I|\le\Delta(g_i)\Big\}$$
with some non-negligible bias depending on $\gamma$ (the number of coordinates is at most $m2^k$). We will show that this assumption leads to a
contradiction.
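Quantitatively, the well-known fact used here follows from Parseval's identity and Cauchy-Schwarz: if a distribution $D$ on $\mathbb{F}_p^r$ has statistical distance $\sigma$ from uniform, then some non-zero $a\in\mathbb{F}_p^r$ satisfies $|\mathbb{E}_{x\sim D}[\omega^{\langle a,x\rangle}]|\ge2\sigma/\sqrt{p^r-1}$. The Python sketch below checks this bound on a hypothetical perturbation of the uniform distribution on $\mathbb{F}_3^2$. The perturbation is an arbitrary illustrative choice.

```python
import cmath
import math
from itertools import product

def linear_biases(dist, p, r):
    """Map each nonzero a in F_p^r to |E_{x ~ dist}[omega^{<a, x>}]|."""
    omega = cmath.exp(2j * cmath.pi / p)
    out = {}
    for a in product(range(p), repeat=r):
        if any(a):
            value = sum(prob * omega ** (sum(ai * xi for ai, xi in zip(a, x)) % p)
                        for x, prob in dist.items())
            out[a] = abs(value)
    return out

def distance_to_uniform(dist, p, r):
    """Statistical distance between dist and the uniform distribution on F_p^r."""
    u = 1.0 / p ** r
    return 0.5 * sum(abs(dist.get(x, 0.0) - u) for x in product(range(p), repeat=r))

p, r = 3, 2
dist = {x: 1.0 / p ** r for x in product(range(p), repeat=r)}
dist[(0, 0)] += 0.05   # hypothetical perturbation of the uniform distribution
dist[(1, 2)] -= 0.05
sd = distance_to_uniform(dist, p, r)
best = max(linear_biases(dist, p, r).values())
print(sd, best, 2 * sd / math.sqrt(p ** r - 1))
```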

Denote $Y'_I=\sum_{j\in I}a^{(I)}_jY_j$, and notice that it depends on exactly the same set of variables from $Y_1,\ldots,Y_k$ as $Y_I$. By our assumption, there exist coefficients $\alpha_{i,I}\in\mathbb{F}$, not all zero, s.t. the polynomial

$$h(Y_1,\ldots,Y_k)=\sum_{i\in[m],\ I\subseteq[k],\ 1\le|I|\le\Delta(g_i)}\alpha_{i,I}\,g_i(x_I+Y'_I)$$

has bias at least $\rho$, where $\rho$ is a function of $\gamma$, $k$ and $m$ only (and not of $n$).

Fix $I_0$ maximal with respect to inclusion s.t. not all $\alpha_{i,I_0}$ are zero. Since we only care about the bias of $h$ under random $Y_1,\ldots,Y_k$, we can multiply each $Y_j$ by some non-zero coefficient. We thus assume w.l.o.g. that $a^{(I_0)}_j=1$ for all $j\in I_0$. Let $|I_0|=r$; we assume w.l.o.g. that $I_0=\{1,2,\ldots,r\}$. Notice that then $Y'_{[r]}=Y_{[r]}$. We also use the shorthand $x=x_{[r]}$.

Let $g_{i_0}$ be a polynomial of maximal degree $d''<d$ among those with $\alpha_{i_0,[r]}\neq0$.

We now differentiate once in each of the variables $Y_1,\ldots,Y_r$. Let $Z_1,\ldots,Z_r$ be new variables in $\mathbb{F}^n$, and consider:

$$h'(Y_1,\ldots,Y_k,Z_1,\ldots,Z_r)=\sum_{w\in\{0,1\}^r}(-1)^{|w|}h(Y_1+w_1Z_1,\ldots,Y_r+w_rZ_r,Y_{r+1},\ldots,Y_k)$$

First, by Lemma 9, $h'$ has bias at least $\rho^{2^r}\ge\rho'=\rho^{2^k}$.

Now, consider what happens to a term $\alpha_{i,I}\,g_i(x_I+Y'_I)$ of $h$ after the differentiation. If $I\neq[r]$ and $\alpha_{i,I}\neq0$, then by the maximality of $I_0$ the set $I$ does not contain $[r]$, so there exists $i'\in[r]$ s.t. $i'\notin I$. The term does not depend on $Y_{i'}$, so differentiating in $Y_{i'}$ zeroes it out. So the only terms remaining in $h'$ come from terms in $h$ of the form $g_i(x+Y_{[r]})$. Thus, $h'$ does not depend on $Y_j$ for $j\notin[r]$, and also all the $g_i$'s remaining must have $\Delta(g_i)\ge r$ (because $g_i(x+Y_{[r]})$ appeared in $h$ with non-zero coefficient). Thus we can write:

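$$h'=h'(Y_1,\ldots,Y_r,Z_1,\ldots,Z_r)=\sum_{i\in[m]}\alpha_{i,[r]}\sum_{w\subseteq[r]}(-1)^{|w|}g_i(x+Y_{[r]}+Z_w)$$

where $Z_w=\sum_{j\in w}Z_j$ (identifying $w\in\{0,1\}^r$ with a subset of $[r]$).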

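We now make an important observation. Notice that $h'$ depends only on the sum $Y_{[r]}$, and not on the individual $Y_1,\ldots,Y_r$. So we can substitute $W=x+Y_{[r]}$, which is uniformly distributed in $\mathbb{F}^n$ (so the bias of $h'$ is unchanged), and get:

$$h'=h'(W,Z_1,\ldots,Z_r)=\sum_{i\in[m]}\alpha_{i,[r]}\sum_{w\subseteq[r]}(-1)^{|w|}g_i(W+Z_w)$$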

We have assumed that $G$ is strong $\mathcal{F}$-regular. We will now show that if we choose $\mathcal{F}$ large enough, we have already reached a contradiction. Notice that the polynomials $g_i(W+Z_w)$ are exactly those which appear in the regularity requirements (where $X$ is replaced here by $W$, and $Y_1,Y_2,\ldots$ by $Z_1,Z_2,\ldots$). Let $G'$ denote the set of $g_i$'s s.t. $g_i$ appears in $h'$ with non-zero coefficient.
We assume by induction that Theorem 2 holds for all degrees below $d$ and for all $n$. Since every $g_i\in G'$ has degree at most $d''\le d-1$, also $\deg(h')\le d''$, and so we can apply Theorem 2 to $h'$. So, since $h'$ has bias at least $\rho'$, there must exist polynomials $q_1,\ldots,q_t\in\mathrm{Der}(h')$ s.t.

$$h'(W,Z_1,\ldots,Z_r)=Q\big(q_1(W,Z_1,\ldots,Z_r),\ldots,q_t(W,Z_1,\ldots,Z_r)\big)$$

for some function $Q:\mathbb{F}^t\to\mathbb{F}$, where $t=t(\rho',d'')$. Moreover, every polynomial $q_j$ is of the form $h'(W+a_0,Z_1+a_1,\ldots,Z_r+a_r)-h'(W,Z_1,\ldots,Z_r)$ for some constants $a_0,\ldots,a_r\in\mathbb{F}^n$, and $h'$ is a linear combination of the $g_i(W+Z_w)$. Hence we can decompose each $q_j$ into a sum of at most $2^r$ polynomials, one for each $w\subseteq[r]$, each built from terms $g_i(W+Z_w+a)-g_i(W+Z_w)$ with $g_i\in G'$, i.e. an element of $\mathrm{Der}(G')$ evaluated at $W+Z_w$. Let $q'_1,\ldots,q'_{t'}\in\mathrm{Der}(G')$ denote these decomposed polynomials. We thus have that:

$$h'(W,Z_1,\ldots,Z_r)=Q'\big(q'_1(W+Z_{w_1}),\ldots,q'_{t'}(W+Z_{w_{t'}})\big)$$

for some function $Q':\mathbb{F}^{t'}\to\mathbb{F}$, where $t'=2^rt$ and $w_1,\ldots,w_{t'}\subseteq[r]$. We have thus shown that we can compute

$$h'(W,Z_1,\ldots,Z_r)=\sum_{i\in[m]}\alpha_{i,[r]}\sum_{w\subseteq[r]}(-1)^{|w|}g_i(W+Z_w)$$

as a function of $t'$ polynomials of degree strictly smaller than $d''$ (each $q'_\ell$ is a derivative of polynomials of degree at most $d''$). If $\mathcal{F}(m)>t'$, this contradicts the strong $\mathcal{F}$-regularity of $g_1,\ldots,g_m$.

Summarizing, there can be no non-trivial linear combination of $\{g_i(x_I+Y'_I): i\in[m],\ I\subseteq[k],\ 1\le|I|\le\Delta(g_i)\}$ which has bias at least $\rho$, and so the distribution is $\gamma$-close to uniform. □
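The key algebraic step in the proof of Lemma 7 can be verified by brute force on a toy instance. That step says that differentiating once in each of $Y_1,\ldots,Y_r$ kills every term of $h$ with $I\neq[r]$ and leaves a function of $W=x+Y_{[r]}$ alone. The sketch below assumes $r=k=2$, $n=1$ and $\mathbb{F}=\mathbb{F}_5$. It uses two hypothetical polynomials $g_1(t)=t^3$ and $g_2(t)=t^2+t$, with arbitrary base points and coefficients. It illustrates the cancellation only, not the regularity argument.

```python
from itertools import product

P = 5  # prime field size; n = 1, so every Y_j, Z_j is a single element of F_5

def g1(t):
    return pow(t, 3, P)

def g2(t):
    return (t * t + t) % P

X1, X2, X12 = 1, 3, 2   # hypothetical base points x_{1}, x_{2}, x_{[2]}

def h(y1, y2):
    # Terms for I = {1}, I = {2} (with coefficient a = 4 on Y_2) and I = [2]
    return (2 * g1(X1 + y1) + 3 * g2(X2 + 4 * y2)
            + g1(X12 + y1 + y2) + 4 * g2(X12 + y1 + y2)) % P

def derive_both(f, y1, y2, z1, z2):
    # sum over w in {0,1}^2 of (-1)^{|w|} f(Y1 + w1 Z1, Y2 + w2 Z2)
    total = sum((-1) ** (w1 + w2) * f(y1 + w1 * z1, y2 + w2 * z2)
                for w1, w2 in product((0, 1), repeat=2))
    return total % P

def predicted(y1, y2, z1, z2):
    # Only the I = [2] terms survive, evaluated at W + Z_w with W = x_{[2]} + Y1 + Y2
    w_point = X12 + y1 + y2
    total = 0
    for w1, w2 in product((0, 1), repeat=2):
        t = w_point + w1 * z1 + w2 * z2
        total += (-1) ** (w1 + w2) * (g1(t) + 4 * g2(t))
    return total % P

ok = all(derive_both(h, *v) == predicted(*v) for v in product(range(P), repeat=4))
print(ok)
```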
6 From approximation to calculation: proof of Theorem 2

In this section we prove Theorem 2. The main technical tools that we will use are Lemmas 6 and 7. Let $p(X)$ stand for a degree $d$ polynomial with bias $\delta$. The proof of the theorem is immediate given the following two lemmas. The first lemma (Lemma 10) asserts that a biased degree $d$ polynomial can be approximated by a constant number of degree $d-1$ polynomials. This lemma was also used in the proof of Green and Tao, and its proof appears in [BV]. The second lemma (Lemma 11) asserts that approximation by few degree $d-1$ polynomials implies computation by few degree $d-1$ polynomials. We now present the two lemmas.

Lemma 10 (Bias implies approximation by few lower degree polynomials). Let $p(X)$ be a polynomial of degree $d$ with bias $\delta$. For any $\epsilon>0$ there exist polynomials $f_1(X),\ldots,f_s(X)$ of degree at most $d-1$ and a function $F:\mathbb{F}^s\to\mathbb{F}$ s.t.

$$\Pr_{X\in\mathbb{F}^n}[F(f_1(X),\ldots,f_s(X))\neq p(X)]<\epsilon$$

The number $s$ of the polynomials depends only on $\delta$ and $\epsilon$. Moreover, $f_1,\ldots,f_s\in\mathrm{Der}(p)$.

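The full proof can be found in [BV] (Lemma 24). The proof idea is that for a random direction $a$, the derivative $p(x+a)-p(x)$ (a polynomial of degree at most $d-1$ in $x$) carries information about $p(x)$ for every $x$, because $p(x+a)$ is distributed like $p$ itself and $p$ is biased. Taking a plurality value over enough random choices
of $a$ (but still a constant number) therefore allows us to compute $p$ on all but an $\epsilon$-fraction of the points.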
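The idea behind Lemma 10 can be illustrated over $\mathbb{F}_2$, where a biased polynomial has a unique most frequent value $c$. For a direction $a$, the derivative $f_a(x)=p(x+a)-p(x)$ has degree at most $d-1$, and the vote $c-f_a(x)$ equals $p(x)$ exactly when $p(x+a)=c$. The Python sketch below takes a majority over three directions and averages the error over all triples of directions. It is a hypothetical example with $p(x)=x_0x_1$ on $\mathbb{F}_2^4$, not the construction of [BV]. Each vote is correct with probability $3/4$ independently, so the average error is $\Pr[\mathrm{Bin}(3,3/4)\le1]=5/32$, compared to $1/4$ for a single direction.

```python
from fractions import Fraction
from itertools import product

N = 4                                   # number of variables (small enough to enumerate)
POINTS = list(product((0, 1), repeat=N))

def p(x):
    # Hypothetical biased degree-2 polynomial over F_2: p(x) = x_0 * x_1
    return (x[0] * x[1]) % 2

def add(x, a):
    return tuple((xi + ai) % 2 for xi, ai in zip(x, a))

def derivative(a):
    # f_a(x) = p(x + a) - p(x), a polynomial of degree at most d - 1 in x
    return lambda x: (p(add(x, a)) - p(x)) % 2

# The most frequent value c of p (unique over F_2 because p is biased)
c = max((0, 1), key=lambda v: sum(1 for x in POINTS if p(x) == v))

def decoder(directions):
    fs = [derivative(a) for a in directions]
    def F(x):
        # F only uses the values f_a(x); the vote c - f_a(x) is correct iff p(x + a) == c
        votes = [(c - f(x)) % 2 for f in fs]
        return max((0, 1), key=votes.count)
    return F

def error(F):
    return Fraction(sum(1 for x in POINTS if F(x) != p(x)), len(POINTS))

triples = list(product(POINTS, repeat=3))
avg_error = sum(error(decoder(t)) for t in triples) / len(triples)
print(c, avg_error)
```

Using more directions drives the error down exponentially, while the number of polynomials $f_a$ stays independent of $n$.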

Lemma 11 (Approximation by few lower degree polynomials implies computation by few lower degree polynomials). Let $p(X)$ be a polynomial of degree $d$, $f_1,\ldots,f_s$ polynomials of degree at most $d-1$ ($s=O(1)$), and $H:\mathbb{F}^s\to\mathbb{F}$ a function s.t. the composition $H(f_1(X),\ldots,f_s(X))$ $\epsilon_d$-approximates $p$, where $\epsilon_d=2^{-4(d+1)}$. Then there exist $s'$ polynomials $f'_1,\ldots,f'_{s'}$ and a function $H':\mathbb{F}^{s'}\to\mathbb{F}$ s.t.

$$H'(f'_1(X),\ldots,f'_{s'}(X))=p(X)$$

Moreover, $s'=s'(d,s)$ (i.e. independent of $n$) and each $f'_i$ is of the form $p(X+a)-p(X)$ or $f_j(X+a)$ for some $a\in\mathbb{F}^n$.

Thus, to complete the proof of Theorem 2 it remains to prove Lemma 11.

We start the proof of Lemma 11 by refining $F=\{f_1,\ldots,f_s\}$ to a strong-regular set. Let $\mathcal{F}$ be a large enough growth function (to be determined later). By Lemma 5 there exist a set $G=\{g_1,\ldots,g_m\}$ refining $F$ and a degree bound $\Delta$, s.t. $G$ is strong $\mathcal{F}$-regular with degree bound $\Delta$. Moreover, there exists $C=C(\mathcal{F},s,d)$ s.t. $G\subseteq\mathrm{Der}_C(F)$. Since $G$ refines $F$, it approximates $p(X)$ at least as well as $F$ does. We will prove that it in fact computes $p$ exactly. We can then decompose each $g_i\in\mathrm{Der}_C(F)$ as a sum of at most $2^C$ elements in $\mathrm{Der}(p)$ to conclude the result.
Thus, we need to show that $G$ computes $p(X)$ exactly. For $c=(c_1,\ldots,c_m)\in\mathbb{F}^m$, denote by $R_c\subseteq\mathbb{F}^n$ the region

$$R_c=\{x\in\mathbb{F}^n: g_i(x)=c_i\ \text{for all } i\in[m]\}$$

Showing that $G$ computes $p(X)$ is equivalent to showing that $p(X)$ is constant on every region $R_c$. Thus, we turn to study the regions $R_c$.

We first show (Lemma 12) that all regions $R_c$ have about the same volume, i.e. that they form an almost uniform partition of $\mathbb{F}^n$ into $|\mathbb{F}|^m$ regions. Since $G$ is a strong regular refinement of $F$ and $H(f_1,\ldots,f_s)$ $\epsilon_d$-approximates $p$, also $G$ $\epsilon_d$-approximates $p$, i.e. there exists some $H':\mathbb{F}^m\to\mathbb{F}$ s.t.

$$\Pr_{X\in\mathbb{F}^n}[H'(g_1(X),\ldots,g_m(X))\neq p(X)]\le\epsilon_d$$






For every region $R_c$, let $\eta_c$ be the probability that $p$ differs from the value predicted by $G$ on that region (this value is constant on the region, namely $H'(c)$):

$$\eta_c=\Pr_{X\in R_c}[p(X)\neq H'(c)]$$

Since the average of $\eta_c$ (weighted by the volumes $|R_c|$) is at most $\epsilon_d$, and all regions are almost uniform (Lemma 12), Markov's inequality shows that there can be at most $\sqrt{\epsilon_d}\,|\mathbb{F}|^m/(1-\gamma)$ regions on which $\eta_c>\sqrt{\epsilon_d}$. We call these the bad regions, and we call the rest of the regions almost good regions. Next we show (Lemma 13) that the almost good regions are in fact totally good, i.e. $p$ is constant on them. Last, we use the fact that there are only few bad regions and $p$ is constant on the rest to conclude that $p$ is also constant on the bad regions (Lemma 15). Thus, $p(X)$ is in fact constant on all regions. To complete the proof of Lemma 11 it remains to prove Lemmas 12, 13 and 15.

Lemma 12 (Regions are uniform). Let $\gamma=\gamma(m)>0$ be a small enough error term. If $\mathcal{F}$ is large enough then

$$|R_c|=|\mathbb{F}|^{n-m}(1\pm\gamma)$$

for all $c\in\mathbb{F}^m$.

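Proof. Let $c\in\mathbb{F}^m$ and assume first that $R_c$ is not empty, i.e. there exists some $x$ s.t. $g_i(x)=c_i$ for all $i\in[m]$. We apply Corollary 8 with $k=1$, $x'=x$ and $y'_1=0$ and get:

$$\Pr_{Y_1}[g_i(x+Y_1)=g_i(x)\ \forall i\in[m]]=|\mathbb{F}|^{-m}(1\pm\gamma)$$

Substituting $Y=x+Y_1$, which is uniform in $\mathbb{F}^n$, gives $|R_c|/|\mathbb{F}|^n=\Pr_Y[Y\in R_c]=|\mathbb{F}|^{-m}(1\pm\gamma)$, which proves the result for $R_c$.

To show that there can be no empty regions, assume otherwise. Then there are at most $|\mathbb{F}|^m-1$ non-empty regions, and each has volume at most $|\mathbb{F}|^{n-m}(1+\gamma)$. Since the non-empty regions cover $\mathbb{F}^n$, we get $(|\mathbb{F}|^m-1)|\mathbb{F}|^{n-m}(1+\gamma)\ge|\mathbb{F}|^n$, i.e. $\gamma\ge1/(|\mathbb{F}|^m-1)$. If $\gamma(m)<|\mathbb{F}|^{-m}$ we get a contradiction. Thus, there are no empty regions, and so all regions have volume $|\mathbb{F}|^{n-m}(1\pm\gamma)$. □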
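For intuition about what Lemma 12 asserts, consider a toy partition by a single polynomial ($m=1$) over $\mathbb{F}_2$. The sketch below uses the hypothetical quadratic $Q(x)=x_0x_1+x_2x_3+\cdots+x_{2k-2}x_{2k-1}$, whose rank grows with $k$. It only illustrates uniform regions for high-rank polynomials; it is not a strong-regular set produced by Lemma 5. The two regions have volumes $2^{n-1}(1\pm2^{-k})$, so the error term shrinks as the rank grows. A low-rank polynomial such as $x_0x_1$ alone ($k=1$) splits $\mathbb{F}_2^2$ into regions of sizes 3 and 1.

```python
from fractions import Fraction
from itertools import product

def quadratic_form(x):
    # Q(x) = x_0 x_1 + x_2 x_3 + ... over F_2 (a sum of k hyperbolic pairs)
    return sum(x[2 * i] * x[2 * i + 1] for i in range(len(x) // 2)) % 2

def region_ratios(k):
    # |R_c| / 2^(n-1) for c = 0, 1, where F_2^n (n = 2k) is partitioned by Q
    n = 2 * k
    sizes = [0, 0]
    for x in product((0, 1), repeat=n):
        sizes[quadratic_form(x)] += 1
    return [Fraction(s, 2 ** (n - 1)) for s in sizes]

for k in range(1, 5):
    print(k, region_ratios(k))
```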

Lemma 13 (Almost good regions are good). Let $R_c$ be a region s.t.

$$\Pr_{X\in R_c}[p(X)=b]\ge1-2^{-2(d+1)}$$

for some constant $b\in\mathbb{F}$. Then $p(X)=b$ for all $X\in R_c$.

Before proving the lemma we need the following counting lemma on the number of hypercubes and pairs of hypercubes inside a region, similar to one in [GT07]. However, our technique avoids the need for integration.

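Lemma 14. Let $\gamma=\gamma(m)>0$ be a small enough error term, and assume $\mathcal{F}$ is large enough. For any region $R=R_c$ and a point $x\in R$ we have:

1. Let $Y_1,\ldots,Y_{d+1}$ be random variables in $\mathbb{F}^n$. Then:

$$\Pr_{Y_1,\ldots,Y_{d+1}\in\mathbb{F}^n}[x+Y_I\in R\ \forall I\subseteq[d+1]]=|\mathbb{F}|^{-\sum_{i=1}^m\sum_{j=1}^{\Delta(g_i)}\binom{d+1}{j}}(1\pm\gamma)$$

2. Let $Y_1,\ldots,Y_{d+1},Z_1,\ldots,Z_{d+1}$ be random variables in $\mathbb{F}^n$. For any non-empty $I_0\subseteq[d+1]$:

$$\Pr_{Y_1,\ldots,Y_{d+1},Z_1,\ldots,Z_{d+1}\in\mathbb{F}^n}[x+Y_I\in R,\ x+Z_I\in R\ \forall I\subseteq[d+1]\mid Y_{I_0}=Z_{I_0}]\le$$

$$|\mathbb{F}|^m\Big(|\mathbb{F}|^{-\sum_{i=1}^m\sum_{j=1}^{\Delta(g_i)}\binom{d+1}{j}}\Big)^2(1+\gamma)$$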
Proof. 1. This is a direct application of Corollary 8 with $k=d+1$, $x'=x$ and $y'_1=\cdots=y'_{d+1}=0$, since $x+Y_I\in R$ for all $I$ if and only if $g_i(x+Y_I)=g_i(x)$ for all $i\in[m]$ and $I\subseteq[d+1]$.

2. Assume w.l.o.g. that $I_0=\{1,2,\ldots,s\}$ for some $1\le s\le d+1$. We start by making a linear transformation on the coordinates to bring $Y_{I_0}$ and $Z_{I_0}$ to a single variable. Let $Y'_i=Y_i$ for $i\neq s$ and $Y'_s=Y_1+\cdots+Y_s$, and similarly define $Z'_1,\ldots,Z'_{d+1}$. We write $Y_I$ in the basis $Y'_1,\ldots,Y'_{d+1}$. Divide $I=I_s\cup\bar I_s$ where $I_s=I\cap[s]$ and $\bar I_s=I\setminus I_s$. We have:

• if $s\notin I$, then $Y_I=\sum_{i\in I}Y'_i$

• if $s\in I$, then $Y_I=Y'_s-\sum_{i\in[s]\setminus I_s}Y'_i+\sum_{i\in\bar I_s}Y'_i$

Consider for every $I$ the set $T_I$ of indices of the $Y'_i$ which appear in the expansion of $Y_I$. Notice that for any $T\subseteq[d+1]$ there is exactly one $I$ s.t. $T_I=T$. Indexing the events by this support (and writing $Y'_J=\sum_{i\in J}Y'_i$), in order that $g_i(x+Y_I)=g_i(x)$ for all $i$ and $I$, we must have in particular that:

• For any $I\subseteq[d+1]$ s.t. $s\notin I$ and $|I|\le\Delta(g_i)$,

$$g_i(x+Y'_I)=g_i(x)$$

• For any $I\subseteq[d+1]$ s.t. $s\in I$ and $|I|\le\Delta(g_i)$,

$$g_i\big(x+Y'_s-Y'_{I\cap[s-1]}+Y'_{I\cap\{s+1,\ldots,d+1\}}\big)=g_i(x)$$
Similarly for the $Z'$'s, using the fact that the event $Y_{I_0}=Z_{I_0}$ translates to $Z'_s=Y'_s$:

• For any $I\subseteq[d+1]$ s.t. $s\notin I$ and $|I|\le\Delta(g_i)$,

$$g_i(x+Z'_I)=g_i(x)$$

• For any $I\subseteq[d+1]$ s.t. $s\in I$ and $|I|\le\Delta(g_i)$,

$$g_i\big(x+Y'_s-Z'_{I\cap[s-1]}+Z'_{I\cap\{s+1,\ldots,d+1\}}\big)=g_i(x)$$

The probability of this event is an upper bound on our required probability. Since our variables

$$Y'_1,\ldots,Y'_{d+1},\ Z'_1,\ldots,Z'_{s-1},\ Z'_{s+1},\ldots,Z'_{d+1}$$

are uniform and independent (conditioned on $Z'_s=Y'_s$), we can apply Lemma 7 to show that its probability is the required upper bound. For each $g_i$ and each $j\ge1$, the number of subsets of size $j$ in the above events is $\binom{d+1}{j}$ for the event on the $Y'$'s, and also $\binom{d+1}{j}$ for the event on $Z'_1,\ldots,Z'_{s-1},Y'_s,Z'_{s+1},\ldots,Z'_{d+1}$. For $j=1$, however, the two families intersect (the event $g_i(x+Y'_s)=g_i(x)$ appears in both), and so the number of events of size 1 is $2\binom{d+1}{1}-1$. Thus, by Lemma 7 the probability of the total event is:

$$|\mathbb{F}|^m\Big(|\mathbb{F}|^{-\sum_{i=1}^m\sum_{j=1}^{\Delta(g_i)}\binom{d+1}{j}}\Big)^2(1\pm\gamma)$$

which upper bounds the required probability.

□ 
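The counting in the proof of part (2) can be checked mechanically. The sketch below uses hypothetical helper names and writes $k=d+1$. It (i) verifies that $I\mapsto T_I$ is a bijection on subsets of $[k]$. It then (ii) represents each event in the two bulleted families as a signed linear form in $Y'_1,\ldots,Y'_k,Z'_1,\ldots,Z'_k$, with $Z'_s$ identified with $Y'_s$, and counts the distinct forms with support size at most $\Delta$. The count is $2\sum_{j=1}^{\Delta}\binom{k}{j}-1$, matching the exponent used above (one such count per $g_i$).

```python
from itertools import combinations
from math import comb

def subsets(k, max_size):
    # All subsets of {1, ..., k} of size at most max_size (including the empty set)
    return [I for j in range(max_size + 1) for I in combinations(range(1, k + 1), j)]

def support_of_Y_I(I, s, k):
    # Support of Y_I in the basis Y'_i = Y_i (i != s), Y'_s = Y_1 + ... + Y_s
    support = set()
    for i in range(1, k + 1):
        if i == s:
            c = 1 if s in I else 0
        else:
            c = (1 if i in I else 0) - (1 if (s in I and i < s) else 0)
        if c != 0:
            support.add(i)
    return frozenset(support)

def event_form(T, s, name):
    # Signed linear form with support T: if s is in T, coefficient -1 on indices below s
    # and +1 elsewhere; otherwise +1 everywhere. Z'_s is identified with Y'_s.
    form = {}
    for i in T:
        var = ("Y", s) if i == s else (name, i)
        form[var] = -1 if (s in T and i < s) else 1
    return frozenset(form.items())

def count_events(k, s, delta):
    nonempty = [T for T in subsets(k, delta) if T]
    forms = {event_form(T, s, "Y") for T in nonempty}
    forms |= {event_form(T, s, "Z") for T in nonempty}
    return len(forms)

def predicted_count(k, delta):
    return 2 * sum(comb(k, j) for j in range(1, delta + 1)) - 1

k, s, delta = 4, 2, 2
print(len({support_of_Y_I(I, s, k) for I in subsets(k, k)}) == 2 ** k)
print(count_events(k, s, delta), predicted_count(k, delta))
```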

We now prove Lemma 13 using Lemma 14. We follow the same proof as in [GT07].
Proof. Let $B\subseteq R$ be the set of all "bad" points $x\in R$ on which $p(x)\neq b$. By our assumption, $|B|\le2^{-2(d+1)}|R|$. Assume $B$ is non-empty, and choose some $x\in B$. Let $Y_1,\ldots,Y_{d+1}$ be random variables in $\mathbb{F}^n$. Fix a small enough $\gamma=\gamma(m)$. By Lemma 14 (1),

$$p_R=\Pr[x+Y_I\in R\ \forall I\subseteq[d+1]]\ge|\mathbb{F}|^{-\sum_{i=1}^m\sum_{j=1}^{\Delta(g_i)}\binom{d+1}{j}}(1-\gamma)$$

We now wish to bound the probability that, when all $x+Y_I$ are in $R$, some $x+Y_I$ with $I\neq\emptyset$ is in $B$. We bound it for each fixed $I$ and then apply a union bound over all possible $I$.

We start by applying Cauchy-Schwarz to transform the problem into counting pairs of hypercubes. Fix some non-empty $I_0\subseteq[d+1]$, and let

$$p_B=\Pr[x+Y_I\in R\ \forall I\subseteq[d+1]\ \wedge\ x+Y_{I_0}\in B]=\sum_{x_0\in B}\Pr[x+Y_I\in R\ \forall I\subseteq[d+1]\ \wedge\ x+Y_{I_0}=x_0]$$

We need to upper bound $p_B$.

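$$p_B^2=\Big(\sum_{x_0\in B}\Pr[x+Y_I\in R\ \forall I\ \wedge\ x+Y_{I_0}=x_0]\Big)^2\le|B|\sum_{x_0\in B}\Pr[x+Y_I\in R\ \forall I\ \wedge\ x+Y_{I_0}=x_0]^2\le$$

$$|B|\Pr[x+Y_I\in R\ \forall I\ \wedge\ x+Z_I\in R\ \forall I\ \wedge\ x+Y_{I_0}=x+Z_{I_0}]=$$

$$|B|\,|\mathbb{F}|^{-n}\Pr[x+Y_I\in R,\ x+Z_I\in R\ \forall I\subseteq[d+1]\mid x+Y_{I_0}=x+Z_{I_0}]$$

where $Z_1,\ldots,Z_{d+1}$ are new independent variables in $\mathbb{F}^n$. The second inequality extends the sum from $x_0\in B$ to all $x_0\in\mathbb{F}^n$ and writes the sum of squares as a probability over two independent copies $Y,Z$; the last equality uses $\Pr[Y_{I_0}=Z_{I_0}]=|\mathbb{F}|^{-n}$ for non-empty $I_0$.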

By claim (2) in Lemma 14, combined with the lower bound on $p_R$ from claim (1), this is at most

$$|B|\,|\mathbb{F}|^{m-n}\,p_R^2\,(1+O(\gamma))$$

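By Lemma 12, $|R|=|\mathbb{F}|^n\Pr_{X\in\mathbb{F}^n}[X\in R]=|\mathbb{F}|^{n-m}(1\pm\gamma)$. Thus, we have that:

$$p_B^2\le\frac{|B|}{|R|}\,p_R^2\,(1+O(\gamma))\le2^{-2(d+1)}p_R^2(1+O(\gamma))$$

and thus $p_B/p_R\le2^{-(d+1)}(1+O(\gamma))$. We can now apply a union bound over all non-empty $I_0\subseteq[d+1]$. Conditioned on $x+Y_I\in R$ for all $I$, the probability that there is some $I_0$ for which $x+Y_{I_0}\in B$ is at most

$$(2^{d+1}-1)\cdot2^{-(d+1)}(1+O(\gamma))<1$$

for small enough $\gamma$.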

Thus, there must exist $y_1,\ldots,y_{d+1}\in\mathbb{F}^n$ s.t.

$$x+y_I\in R\setminus B$$

for all non-empty $I\subseteq[d+1]$. Equivalently, $p(x+y_I)=b$ for all such $I$'s. However, since $p$ has degree $d$, its $(d+1)$-st derivative vanishes, so

$$p(x)=\sum_{I\subseteq[d+1],\ |I|>0}(-1)^{|I|+1}p(x+y_I)$$

and so if all $p(x+y_I)=b$, then also $p(x)=b\sum_{I\neq\emptyset}(-1)^{|I|+1}=b$, hence $x\notin B$, a contradiction. So we have proved that $B$ is empty, i.e. $p$ is constant on $R$. □
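The Cauchy-Schwarz step in the proof of Lemma 13 only uses two facts. The first is the identity $\sum_{x_0}\Pr[E\wedge V=x_0]^2=\Pr[E(Y)\wedge E(Z)\wedge V(Y)=V(Z)]$ for two independent copies $Y,Z$. The second is the bound $(\sum_{x_0\in B}a_{x_0})^2\le|B|\sum_{x_0\in B}a_{x_0}^2$. The sketch below checks both on a tiny sample space with exact rational arithmetic. The event `in_cube_event`, the map `corner` and the set `B` are hypothetical stand-ins for "$x+Y_I\in R$ for all $I$", for $x+Y_{I_0}$, and for the bad set.

```python
from fractions import Fraction
from itertools import product

P = 3
SAMPLE_SPACE = list(product(range(P), repeat=2))  # uniform (Y1, Y2) in F_3 x F_3

def in_cube_event(y):
    # Hypothetical stand-in for the event "x + Y_I in R for all I"
    return (y[0] * y[1]) % P != 1

def corner(y):
    # Hypothetical stand-in for the point x + Y_{I_0}
    return (y[0] + y[1]) % P

def joint(x0):
    # Pr[event and corner == x0]
    hits = sum(1 for y in SAMPLE_SPACE if in_cube_event(y) and corner(y) == x0)
    return Fraction(hits, len(SAMPLE_SPACE))

B = {0, 2}                                   # hypothetical "bad" set
p_B = sum(joint(x0) for x0 in B)
collision = sum(joint(x0) ** 2 for x0 in range(P))
# The same quantity via two independent copies Y, Z that agree on the corner
pairs = Fraction(
    sum(1 for y, z in product(SAMPLE_SPACE, repeat=2)
        if in_cube_event(y) and in_cube_event(z) and corner(y) == corner(z)),
    len(SAMPLE_SPACE) ** 2,
)
print(p_B ** 2, len(B) * collision, collision == pairs)
```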

We finish the proof of Theorem 2 by proving that if $p(X)$ is constant over almost all regions, then it must be constant over every region.

Lemma 15 (If almost all regions are totally good, all are totally good). Assume that the fraction of regions on which $p$ is constant is at least $1-2^{-(d+2)}$. Then $p$ is constant over every region.

Proof. Let $R$ be any region, and $x,x'\in R$ two points in $R$. We need to show that $p(x)=p(x')$. Choose $y'_1,\ldots,y'_{d+1}\in\mathbb{F}^n$ uniformly at random. For each fixed non-empty $I\subseteq[d+1]$ the point $x'+y'_I$ is uniform in $\mathbb{F}^n$, so the probability that it falls in a bad region is at most $2^{-(d+2)}(1+\gamma)$ (since regions are almost uniform, see Lemma 12). Thus, applying a union bound over all $2^{d+1}-1$ non-empty $I\subseteq[d+1]$, we get that the points $x'+y'_I$ fall in good regions for all non-empty $I$ with probability at least $1/2$ (for small enough $\gamma$). Fix some $y'_1,\ldots,y'_{d+1}$ fulfilling this requirement.

Let $Y_1,\ldots,Y_{d+1}\in\mathbb{F}^n$ be random variables. Since $g_i(x)=g_i(x')$ for all $i\in[m]$, we can apply Corollary 8:

$$\Pr[g_i(x+Y_I)=g_i(x'+y'_I)\ \forall i\in[m],\ I\subseteq[d+1]]=|\mathbb{F}|^{-\sum_{i=1}^m\sum_{j=1}^{\Delta(g_i)}\binom{d+1}{j}}(1\pm\gamma)$$

In particular, for small enough $\gamma$ we get that

$$\Pr[g_i(x+Y_I)=g_i(x'+y'_I)\ \forall i\in[m],\ I\subseteq[d+1]]>0$$

Let $y_1,\ldots,y_{d+1}$ be such an assignment to $Y_1,\ldots,Y_{d+1}$. We thus have that for all non-empty $I\subseteq[d+1]$ and for all $i\in[m]$, $g_i(x+y_I)=g_i(x'+y'_I)$, i.e. $x+y_I$ lies in the same region as $x'+y'_I$. Since $p$ is constant on the (good) region of $x'+y'_I$ for all non-empty $I$, we get that for all non-empty $I\subseteq[d+1]$,

$$p(x+y_I)=p(x'+y'_I)$$

We now use the fact that $p$ is a degree $d$ polynomial: if we differentiate $p$ $d+1$ times, in any directions, we always get zero. We thus have that for all $x,y_1,\ldots,y_{d+1}\in\mathbb{F}^n$:

$$\sum_{I\subseteq[d+1]}(-1)^{|I|}p(x+y_I)=0$$

Since the same identity holds for $x',y'_1,\ldots,y'_{d+1}$, and the terms with $I\neq\emptyset$ agree in the two identities, we get that $p(x)=p(x')$. □
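The identity $\sum_{I\subseteq[d+1]}(-1)^{|I|}p(x+y_I)=0$, used in the proofs of Lemmas 13 and 15, can be verified exhaustively on a small example. The sketch below assumes $\mathbb{F}=\mathbb{F}_3$, $n=2$ and the hypothetical degree-2 polynomial $p(x)=x_0x_1+2x_0^2+x_1$. It checks that the alternating sum vanishes for all $x$ and all choices of $d+1=3$ directions. With only two directions the sum can be non-zero.

```python
from itertools import combinations, product

P = 3   # prime field size
N = 2   # number of variables
D = 2   # degree of p

def p(x):
    # Hypothetical degree-2 polynomial over F_3: x0*x1 + 2*x0^2 + x1
    return (x[0] * x[1] + 2 * x[0] ** 2 + x[1]) % P

def shift(x, ys, I):
    # x + y_I, where y_I is the sum of y_i over i in I
    return tuple((x[t] + sum(ys[i][t] for i in I)) % P for t in range(N))

def cube_sum(x, ys):
    # sum over all subsets I of the directions of (-1)^{|I|} p(x + y_I), reduced mod P
    k = len(ys)
    total = 0
    for size in range(k + 1):
        for I in combinations(range(k), size):
            total += (-1) ** size * p(shift(x, ys, I))
    return total % P

points = list(product(range(P), repeat=N))
vanishes = all(cube_sum(x, ys) == 0
               for x in points for ys in product(points, repeat=D + 1))
print(vanishes)
```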